Biological database

Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analyses. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics.^[1] Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), and clinical effects of mutations, as well as similarities of biological sequences and structures.

Relational database concepts from computer science and information retrieval concepts from digital libraries are important for understanding biological databases. Biological database design, development, and long-term management are a core area of the discipline of bioinformatics.^[2] Data contents include gene sequences, textual descriptions, attributes and ontology classifications, citations, and tabular data. These are often described as semi-structured data, and can be represented as tables, key-delimited records, and XML structures. Cross-references among databases are common and use database accession numbers.
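As a minimal sketch of the three representations mentioned above, the following Python example parses one key-delimited record and renders it as XML and as a table row. The record, its keys (ID, ORGANISM, XREF), and the database names in it are hypothetical and do not follow any real database's flat-file format; the helper functions are illustrative only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical key-delimited record; keys and values are invented for
# illustration and do not follow any real database's flat-file format.
FLAT_RECORD = """\
ID: EX000001
DESCRIPTION: example gene record
ORGANISM: Escherichia coli
XREF: OTHERDB; OX12345
XREF: LITDB; 0000001
"""


def parse_flat(text):
    """Parse 'KEY: value' lines into a dict; repeated keys collect into lists."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        record.setdefault(key.strip(), []).append(value.strip())
    return record


def to_xml(record):
    """Render the parsed record as an XML string, one element per value."""
    root = ET.Element("entry")
    for key, values in record.items():
        for value in values:
            child = ET.SubElement(root, key.lower())
            child.text = value
    return ET.tostring(root, encoding="unicode")


def to_table_row(record, columns):
    """Render the record as one CSV row; multi-valued fields are joined with '|'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["|".join(record.get(col, [])) for col in columns])
    return buffer.getvalue().strip()


record = parse_flat(FLAT_RECORD)
print(record["XREF"])
print(to_xml(record))
print(to_table_row(record, ["ID", "ORGANISM", "XREF"]))
```

The repeated XREF key is what makes the record semi-structured: it maps naturally onto repeated XML elements, but has to be flattened (here with a '|' separator, an arbitrary choice) to fit a single table cell.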

1 Overview
2 Output
3 Problems associated with protein databases
4 Species-specific databases
5 See also
6 References
7 External links

Overview

Biological databases are an important tool for helping scientists understand and explain a host of biological phenomena, from the structure of biomolecules and their interactions, to the whole metabolism of organisms, to the evolution of species. This knowledge helps in the fight against diseases, assists in the development of medications, and aids in discovering basic relationships among species in the history of life.

Biological knowledge is distributed amongst many different general and specialized databases. This sometimes makes it difficult to ensure the consistency of information. Biological databases cross-reference other databases with accession numbers as one way of linking their related knowledge together.
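The following Python sketch illustrates how accession-number cross-references link records, and why consistency is hard to guarantee when the linked databases are maintained separately. The two in-memory databases ("seqdb" and "protdb"), their accession numbers, and their fields are all invented for illustration; real databases expose such links through their own formats and services.

```python
# Two hypothetical in-memory "databases", each keyed by its own accession
# numbers. All names, accessions, and fields are invented for illustration.
DATABASES = {
    "seqdb": {
        "S0001": {
            "organism": "Escherichia coli",
            "xrefs": [("protdb", "P0001")],
        },
        "S0002": {
            "organism": "Drosophila melanogaster",
            "xrefs": [("protdb", "P0002"), ("protdb", "P9999")],
        },
    },
    "protdb": {
        "P0001": {"organism": "Escherichia coli", "xrefs": []},
        "P0002": {"organism": "Drosophila", "xrefs": []},
    },
}


def resolve(db, accession):
    """Return the record for (db, accession), or None if it does not exist."""
    return DATABASES.get(db, {}).get(accession)


def check_xrefs(db, accession):
    """Follow each cross-reference from one record and report problems found."""
    source = resolve(db, accession)
    if source is None:
        return [f"{db}:{accession} not found"]
    problems = []
    for target_db, target_acc in source["xrefs"]:
        target = resolve(target_db, target_acc)
        if target is None:
            # The link points at an accession the other database does not hold.
            problems.append(f"dangling link to {target_db}:{target_acc}")
        elif target["organism"] != source["organism"]:
            # Both records exist but describe the organism differently.
            problems.append(f"organism mismatch with {target_db}:{target_acc}")
    return problems


print(check_xrefs("seqdb", "S0001"))
print(check_xrefs("seqdb", "S0002"))
```

In this sketch the first record links cleanly, while the second shows two typical inconsistencies: a linked record that names the organism differently, and a link to an accession that no longer resolves.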

An important resource for finding biological databases is a special yearly issue of the journal Nucleic Acids Research (NAR). The Database Issue of NAR is freely available, and categorizes many of the publicly available online databases related to biology and bioinformatics.

Output

Biological data comes in many formats, including text, sequence data, protein structures, and links. Each is available from particular sources, for example:

Text formats are provided by PubMed and OMIM.
Sequence data are provided by GenBank (DNA) and UniProt (protein).
Protein structures are provided by PDB, SCOP, and CATH.

Problems associated with protein databases

Because of the 3D nature of protein structure, discovery in this area has not advanced as quickly as discovery in the area of sequence data, so less structural information is available. Nonetheless, data can be accessed through the members of the wwPDB (PDBe, PDBj, and RCSB PDB), SCOP (Structural Classification of Proteins), and CATH.

Species-specific databases

Species-specific databases are available for some species, mainly those that are often used in research. For example, Colibase is an E. coli database. Other popular species-specific databases include FlyBase for Drosophila and WormBase for the nematodes Caenorhabditis elegans and Caenorhabditis briggsae.

References

^ Altman RB (March 2004). "Building successful biological databases". Brief. Bioinformatics 5 (1): 4–5. doi:10.1093/bib/5.1.4. PMID 15153301. http://bib.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=15153301.
^ Bourne P (August 2005). "Will a biological database be different from a biological journal?". PLoS Comput. Biol. 1 (3): 179–81. doi:10.1371/journal.pcbi.0010034. PMC 1193993. PMID 16158097. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=1193993.

http://www.avatar.se/molbioinfo2001/databases.html

External links

Wiki of biological databases
Interactive list of biological databases, classified by categories, from Nucleic Acids Research, 2010
Genome Proteome Search Engine to search across biological databases
DBD: Database of Biological Databases
CAMERA Cyberinfrastructure for Metagenomics, free data repository and bioinformatics tools for metagenomics.